Amdt. dated May 13, 2005

Reply to Office Action of April 4, 2005 

Amendments to the Claims: 

This listing of claims will replace all prior versions, and listings, of claims in the 
application: 
Listing of Claims: 

Please amend claim 1 as follows: 

1. (currently amended): A method for reformulating raw data
appearing in a delineated table region of an electronic document into a table structure,
comprising the steps of: 

a) reading raw data, said raw data spatially arranged in a delineated table
region of an electronic document, said raw data, as read, lacking hierarchical
arrangement sufficient to enable a logical query of said raw data based on said spatial
arrangement;

b) creating a binary tree using a hierarchical clustering of a plurality of words included in
said raw data;

c) segregating a plurality of table columns from the raw data using a breadth-first
traversal algorithm; 

d) identifying column headers, if any, from the plurality of columns using a first heuristic
algorithm; 

e) identifying row headers, if any, from the column headers using a second heuristic
algorithm;
f) segregating at least one table row from the raw data using a row determination
algorithm; and

g) storing the plurality of columns and the at least one row into a table structure.

2. (previously presented): The method according to claim 1, wherein the hierarchical 
clustering further comprises the steps of: 

a) generating a plurality of leaf clusters; 

b) calculating a plurality of inter-cluster distances for each one of a plurality of clusters; 

c) merging the two clusters having a minimum inter-cluster distance calculated in b) to 
create a new cluster; 

d) creating an interior node of the binary tree with the said two clusters as its children;
and

e) repeating steps b) through d) until there is only one cluster left without a parent.
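The merge loop of claim 2 can be sketched as ordinary agglomerative clustering. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Node` class, `build_cluster_tree`, and the toy distance (difference of mean start positions) are hypothetical stand-ins; claim 4 describes the distance actually contemplated.

```python
from itertools import combinations
from statistics import mean


class Node:
    """Binary-tree node; a leaf holds one word, an interior node two children."""

    def __init__(self, words, left=None, right=None):
        self.words = words
        self.left = left
        self.right = right


def build_cluster_tree(words, distance):
    """Agglomerative clustering of words into a binary tree (sketch of claim 2)."""
    # a) one leaf cluster per word
    clusters = [Node([w]) for w in words]
    # e) repeat until only one cluster without a parent remains
    while len(clusters) > 1:
        # b) inter-cluster distance for every pair; keep the closest pair
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: distance(clusters[p[0]].words, clusters[p[1]].words),
        )
        # c) + d) merge the pair into a new interior node with both as children
        merged = Node(clusters[i].words + clusters[j].words, clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


# Hypothetical toy distance: each word is reduced to its start position.
def toy_distance(a, b):
    return abs(mean(a) - mean(b))


root = build_cluster_tree([0, 1, 10, 11], toy_distance)
print(root.left.words, root.right.words)  # [0, 1] [10, 11]
```

The distance is passed in as a function so the same loop can be reused with the span-based measure of claim 4.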

3. (original): The method according to claim 2, wherein each one of the plurality of leaf 
clusters comprises a single one of said plurality of words.

4. (original): The method according to claim 2, wherein the inter-cluster distance is 
determined by an algorithm comprising the steps of:

a) calculating a position vector (span) for each one of the plurality of words, said span 
comprising the starting and ending horizontal position of each said word; and 
b) determining a unique separation distance between each unique cluster of the plurality
of clusters and each one of the other clusters in the plurality of clusters by: 

1) using positional vector subtraction of the individual cluster positional vectors 
when each cluster is comprised of a single word; and 

2) when at least one of the clusters is a merged cluster, computing the average 
separation distance of all the unique inter-cluster separation distances comprising the cluster pair. 
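A possible reading of the claim 4 distance is sketched below, assuming each word is a `(text, start, end)` triple. The claim only says "positional vector subtraction", so taking the Euclidean length of the span difference is an assumption; the averaging over all word pairs for merged clusters corresponds to what is usually called average linkage. The function names are hypothetical.

```python
import math
from itertools import product


def span(word):
    """a) Position vector (span) of a word given as (text, start, end)."""
    _text, start, end = word
    return (start, end)


def word_distance(w1, w2):
    """b) 1) Single-word clusters: length of the span-vector difference.

    Assumption: the Euclidean norm; the claim only specifies vector subtraction.
    """
    (s1, e1), (s2, e2) = span(w1), span(w2)
    return math.hypot(s1 - s2, e1 - e2)


def cluster_distance(c1, c2):
    """b) 2) Merged clusters: average of all word-to-word distances between them.

    For two single-word clusters this reduces to word_distance.
    """
    pairs = list(product(c1, c2))
    return sum(word_distance(a, b) for a, b in pairs) / len(pairs)


x, y, z = ("Year", 0, 4), ("2004", 3, 8), ("Qtr", 0, 4)
print(cluster_distance([x], [y]))     # 5.0
print(cluster_distance([x], [y, z]))  # 2.5
```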

5. (original): The method according to claim 4, wherein the distance comprises one from
the group consisting of geometric, syntactic, and semantic. 

6. (previously presented): The method according to claim 1, wherein the breadth-first
traversal algorithm comprises the steps of: 

a) beginning at the root node, split the node into two nodes and determine whether the
two split nodes can be split into subordinate nodes based on a spacing decision criteria; 

b) if a node cannot be split, move the node into a storage buffer, else repeat step a) for
any remaining nodes; and 

c) when all nodes have been moved into the storage buffer, the columns are defined as the 
nodes in the storage buffer. 
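The traversal of claim 6 can be sketched with a FIFO queue over the cluster tree. In this sketch the spacing decision criteria are abstracted as a hypothetical callback `should_split(node, is_root)`; the `Node` dataclass and `segregate_columns` are illustrative names, not taken from the application.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    words: List[str]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def segregate_columns(root: Node, should_split: Callable[[Node, bool], bool]) -> List[Node]:
    """Breadth-first descent of the cluster tree (sketch of claim 6)."""
    buffer: List[Node] = []          # storage buffer of nodes that are not split
    queue = deque([root])            # a) begin at the root node
    while queue:
        node = queue.popleft()
        is_leaf = node.left is None or node.right is None
        if not is_leaf and should_split(node, node is root):
            queue.extend([node.left, node.right])  # split into two nodes
        else:
            buffer.append(node)      # b) cannot be split: move to the buffer
    return buffer                    # c) the columns are the nodes in the buffer


# Hypothetical tree: "a" and "b" are close together, "c" lies far to the right.
tree = Node(["a", "b", "c"], Node(["a", "b"], Node(["a"]), Node(["b"])), Node(["c"]))
columns = segregate_columns(tree, lambda node, is_root: is_root)
print([col.words for col in columns])  # [['a', 'b'], ['c']]
```

Because the buffer is filled in breadth-first order rather than left-to-right order, a sort by starting position such as the one in claim 8 is still needed afterwards.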

7. (original): The method according to claim 6, wherein the spacing decision criteria 
comprises the splitting of the node if and only if: 
a) the node is the root node;

b) if $g > G$; or

c) if $g < G$ and $g/m_g > \alpha$

where $g$ is a gap between clusters, $G$ is a predetermined constant, $m_g$ is an average gap between
adjacent pairs of already identified columns, and $\alpha$ is a number between 0 and 1.
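The three conditions of claim 7 translate into a short predicate. This sketch assumes that $m_g$ is the mean of a list of gaps between adjacent, already identified columns, and that condition c) simply does not apply while no such gaps exist yet; both points are assumptions, as are the function and parameter names.

```python
from statistics import mean
from typing import List


def should_split(is_root: bool, g: float, G: float,
                 column_gaps: List[float], alpha: float) -> bool:
    """Spacing decision criteria of claim 7 (sketch).

    g: gap between the two candidate clusters; G: predetermined constant;
    column_gaps: gaps between adjacent, already identified columns (mean is m_g);
    alpha: a number between 0 and 1.
    """
    if is_root:                      # a) the root node is always split
        return True
    if g > G:                        # b) the gap exceeds the constant
        return True
    if not column_gaps:              # assumption: m_g undefined, so c) cannot apply
        return False
    m_g = mean(column_gaps)
    return g < G and m_g > 0 and g / m_g > alpha   # c)


print(should_split(False, 3, 5, [4, 4], 0.5))  # True, since 3/4 > 0.5
```

Note that with strict inequalities a gap exactly equal to $G$ satisfies neither b) nor c), so such a node is not split.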

8. (original): The method according to claim 6, additionally including the step of sorting 
columns according to a starting position of each one of the plurality of columns.

9. (original): The method according to claim 8, additionally including the step of 
adjusting the upper boundary of the table region by performing a consistency test. 

10. (original): The method according to claim 9, wherein the consistency test comprises 
the steps of: 

a) calculating a predominant string type for each one of the plurality of columns included
in the table region (column type);

b) starting at a predetermined number of table lines below the start of the table region,
calculating a unique string type for each one of the plurality of words in said table line (word
type);

c) comparing each one of the plurality of word types with the associated column type; 

d) generating a plurality of metrics associated with the result of said comparisons; and 
e) if a majority of said metrics are true, identifying the current line as the bottom line of
the box region and ending the consistency test, or else moving up one table line and repeating 
steps c) and d). 
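One way to read the consistency test of claim 10 is sketched below, with each table line given as a list of cells aligned to the columns. Two points are assumptions rather than claim language: the string typing (a simple numeric/text regex), and the reading of each metric as a *mismatch* between a word's type and its column type, so that the first line, moving upward, where most words disagree with their columns is reported as the bottom line of the box region. All names are hypothetical.

```python
import re
from collections import Counter
from typing import List, Optional

# Hypothetical string typing: numbers (sign, commas, decimals, %) vs. text.
NUMBER = re.compile(r"^[-+]?[\d,]*\.?\d+%?$")


def string_type(s: str) -> str:
    return "numeric" if NUMBER.match(s) else "text"


def column_types(lines: List[List[str]]) -> List[Optional[str]]:
    """a) Predominant string type of each column (most common type of its cells)."""
    types = []
    for c in range(len(lines[0])):
        counts = Counter(string_type(line[c]) for line in lines if line[c])
        types.append(counts.most_common(1)[0][0] if counts else None)
    return types


def consistency_test(lines: List[List[str]], start: int) -> Optional[int]:
    """Sketch of claim 10; returns the index of the bottom line of the box region.

    Assumption: each metric is True when a word's type differs from its column type.
    """
    col_types = column_types(lines)
    i = min(start, len(lines) - 1)              # b) start a fixed number of lines down
    while i >= 0:
        metrics = [string_type(cell) != col_types[c]   # c) + d)
                   for c, cell in enumerate(lines[i]) if cell]
        if metrics and sum(metrics) > len(metrics) / 2:
            return i                             # e) majority true: bottom of the box
        i -= 1                                   # e) otherwise move up one line
    return None


table = [["Year", "Sales"], ["2003", "120"], ["2004", "135"], ["2005", "150"]]
print(consistency_test(table, start=2))  # 0
```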

11. (original): The method according to claim 1, wherein the first heuristic algorithm for 
identifying column headers comprises the steps of: 

a) dividing each table line into a plurality of unique separable strings; 

b) creating a hierarchical tree having a box as the root, each one of the plurality of table 
columns as the leaves, and higher level headers as intermediate nodes of said tree; 

c) calculating a joint span for each one of the plurality of separable strings using the 
equation 

$p_j = (\min_i s_i, \max_i e_i), \quad i = 1, \ldots, n$

d) comparing the boundaries of each one of the plurality of joint spans with the 
boundaries of each one of the plurality of table columns; and 

e) creating a list of associated columns that have overlapping boundaries in b) using a 
boundary criteria. 
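Steps c) through e) of claim 11 can be sketched with spans as `(start, end)` pairs, where $s_i$ and $e_i$ are read as the starting and ending positions of the $n$ words in a separable string. Treating "overlapping boundaries" as a non-empty intersection of closed intervals is an assumption, and `joint_span` and `associated_columns` are hypothetical names.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) horizontal positions


def joint_span(word_spans: List[Span]) -> Span:
    """c) Joint span: (min of the starts, max of the ends) over the string's words."""
    return (min(s for s, _ in word_spans), max(e for _, e in word_spans))


def associated_columns(phrase: Span, columns: List[Span]) -> List[int]:
    """d) + e) Indices of the columns whose boundaries overlap the joint span.

    Assumption: "overlapping" means the closed intervals intersect.
    """
    ps, pe = phrase
    return [k for k, (cs, ce) in enumerate(columns) if cs <= pe and ps <= ce]


header = joint_span([(10, 14), (16, 22)])        # a two-word header such as "Fiscal Year"
cols = [(0, 6), (10, 15), (18, 24), (30, 35)]
print(header, associated_columns(header, cols))  # (10, 22) [1, 2]
```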

12. (original): The method according to claim 11, wherein each one of the separable
strings is delineated by a predetermined number of blank spaces.
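A minimal sketch of the string separation in claim 12, assuming the "predetermined number of blank spaces" is a parameter `min_gap` (hypothetical name) and that only space characters, not tabs, count as blanks. Words joined by shorter runs of spaces stay in the same separable string.

```python
import re
from typing import List, Tuple


def separable_strings(line: str, min_gap: int = 2) -> List[Tuple[str, int, int]]:
    """Split a table line wherever min_gap or more consecutive spaces occur.

    Returns (string, start, end) triples with end exclusive; tabs are not handled.
    """
    if min_gap < 1:
        raise ValueError("min_gap must be at least 1")
    # Words may be joined by runs of fewer than min_gap spaces.
    inner = "" if min_gap == 1 else r"(?: {1,%d}\S+)*" % (min_gap - 1)
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+" + inner, line)]


print(separable_strings("Net income   2004  2005"))
# [('Net income', 0, 10), ('2004', 13, 17), ('2005', 19, 23)]
```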
13. (original): The method according to claim 11, wherein the boundary criteria further
comprises the steps of: 

a) associating each phrase with at least one column; and 

b) if a phrase is associated with more than one column, the subsidiary columns must
already have their own headers filled.
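The boundary criteria of claim 13 amount to a validity check on one phrase. In this sketch, `columns` holds the indices of the columns the phrase overlaps and `headers` maps a column index to the header already assigned to it; both representations and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def satisfies_boundary_criteria(columns: List[int],
                                headers: Dict[int, Optional[str]]) -> bool:
    """Sketch of claim 13 for a single phrase."""
    if not columns:                  # a) a phrase must be associated with a column
        return False
    if len(columns) == 1:            # a single column needs nothing further
        return True
    # b) a phrase spanning several columns is accepted only if every
    #    subsidiary column already has its own header filled
    return all(headers.get(c) for c in columns)


print(satisfies_boundary_criteria([1, 2], {1: "2004", 2: "2005"}))  # True
print(satisfies_boundary_criteria([1, 2], {1: "2004"}))             # False
```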

14. (original): The method according to claim 1, wherein the second heuristic algorithm
for identifying row headers comprises the steps of: 

a) identifying a region as a stub region if the left-most column does not include a column
header;

b) performing a semantic analysis of the data contents of the left-most column if the left-most
column does include a column header; and

c) detecting and storing the unique row headers for each line from steps a) and b). 

15. (original): The method according to claim 1, wherein the row determination 
algorithm comprises the steps of: 

a) defining a row separator if said row comprises a blank line; 

b) determining at least one core row, comprising: 

1) a row having a non-empty string in a stub region and having at least one other
column; or

2) a row having non-empty strings in a majority of the columns of the table; and
c) determining non-core rows, if any.

16. (original): The method according to claim 15, wherein non-core rows comprise all 
rows that are not core rows. 
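Claims 15 and 16 can be sketched as a per-line classifier, assuming each line is a list of cells whose first cell is the stub region. Reading "at least one other column" as "at least one other non-empty column" is an assumption, and `classify_rows` is a hypothetical name.

```python
from typing import List


def classify_rows(lines: List[List[str]]) -> List[str]:
    """Sketch of claims 15-16; cell 0 of each line is taken to be the stub region."""
    labels = []
    for cells in lines:
        filled = [bool(c.strip()) for c in cells]
        if not any(filled):
            labels.append("separator")   # a) a blank line is a row separator
        # b) 1) stub plus another (assumed non-empty) column, or 2) a majority filled
        elif (filled[0] and any(filled[1:])) or sum(filled) > len(filled) / 2:
            labels.append("core")
        else:
            labels.append("non-core")    # c) claim 16: every row that is not core
    return labels


rows = [["Revenue", "10", "12"], ["", "", ""], ["", "5", "6"],
        ["(continued)", "", ""], ["", "", "7"]]
print(classify_rows(rows))
# ['core', 'separator', 'core', 'non-core', 'non-core']
```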

17. (original): The method according to claim 1, additionally including the step of
testing of said delineated table by creating a directed acyclic graph.

18. (previously presented): The method according to claim 17, additionally including the
step of testing of said delineated table by logically probing said directed acyclic graph. 

19. (original): The method according to claim 18, wherein the step of testing said table 
comprises the comparison of responses from a plurality of logical tests conducted on said graph 
with an associated plurality of predetermined reference responses. 
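Claims 17 through 19 can be illustrated with a small graph in which header nodes point to the cell nodes they label. Everything here is an assumed encoding for illustration: the node tuples, the intersection-based `probe` standing in for a "logical probe", and `score` as one way of comparing responses with predetermined reference responses.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[Tuple, Set[Tuple]]


def build_dag(row_headers: List[str], col_headers: List[str]) -> Graph:
    """Edges run from header nodes to the cell nodes they label, so no cycle forms."""
    graph: Graph = {}
    for i, rh in enumerate(row_headers):
        for j, ch in enumerate(col_headers):
            cell = ("cell", i, j)
            graph.setdefault(("row", rh), set()).add(cell)
            graph.setdefault(("col", ch), set()).add(cell)
    return graph


def probe(graph: Graph, cells: List[List[str]], row: str, col: str) -> List[str]:
    """Logical query: values of the cells reachable from both headers."""
    hits = graph.get(("row", row), set()) & graph.get(("col", col), set())
    return [cells[i][j] for _, i, j in sorted(hits)]


def score(graph: Graph, cells: List[List[str]], probes, references) -> float:
    """Fraction of probe responses that match their reference responses."""
    responses = [probe(graph, cells, r, c) for r, c in probes]
    return sum(got == want for got, want in zip(responses, references)) / len(references)


cells = [["120", "12"], ["135", "15"]]
dag = build_dag(["2004", "2005"], ["Sales", "Profit"])
print(probe(dag, cells, "2005", "Sales"))  # ['135']
print(score(dag, cells, [("2004", "Profit"), ("2005", "Sales")], [["12"], ["999"]]))  # 0.5
```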

20. (cancelled) 

21. (cancelled) 